count-min sketch
A Bayesian Nonparametric View on Count-Min Sketch
The count-min sketch is a time-and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data with known ground truth, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
In the last decade, it has been shown that many hard AI tasks, especially in NLP, can be naturally modeled as extreme classification problems leading to improved precision. However, such models are prohibitively expensive to train due to the memory bottleneck in the last layer. For example, a reasonable softmax layer for the dataset of interest in this paper can easily reach well beyond 100 billion parameters (> 400 GB memory). To alleviate this problem, we present Merged-Average Classifiers via Hashing (MACH), a generic $K$-classification algorithm where memory provably scales at $O(\log K)$ without any assumption on the relation between classes. MACH is subtly a count-min sketch structure in disguise, which uses universal hashing to reduce classification with a large number of classes to few embarrassingly parallel and independent classification tasks with a small (constant) number of classes.
A Bayesian Nonparametric View on Count-Min Sketch
The count-min sketch is a time-and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data with known ground truth, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Adaptive-GraphSketch: Real-Time Edge Anomaly Detection via Multi-Layer Tensor Sketching and Temporal Decay
Ekle, Ocheme Anthony, Eberle, William
Anomaly detection in dynamic graphs is essential for identifying malicious activities, fraud, and unexpected behaviors in real-world systems such as cybersecurity and power grids. However, existing approaches struggle with scalability, probabilistic interpretability, and adaptability to evolving traffic patterns. In this paper, we propose ADAPTIVE-GRAPHSKETCH, a lightweight and scalable framework for real-time anomaly detection in streaming edge data. Our method integrates temporal multi-tensor sketching with Count-Min Sketch using Conservative Update (CMS-CU) to compactly track edge frequency patterns with bounded memory, while mitigating hash collision issues. We incorporate Bayesian inference for probabilistic anomaly scoring and apply Exponentially Weighted Moving Average (EWMA) for adaptive thresholding tuned to burst intensity. Extensive experiments on four real-world intrusion detection datasets demonstrate that ADAPTIVE-GRAPHSKETCH outperforms state-of-the-art baselines such as ANOEDGE-G/L, MIDAS-R, and F-FADE, achieving up to 6.5% AUC gain on CIC-IDS2018 and up to 15.6% on CIC-DDoS2019, while processing 20 million edges in under 3.4 seconds using only 10 hash functions. Our results show that ADAPTIVE-GRAPHSKETCH is practical and effective for fast, accurate anomaly detection in large-scale streaming graphs. Keywords: Anomaly Detection, Streaming, Real-time, Dynamic Graphs, Edge Streams, Tensor Sketching
- North America > United States > Tennessee (0.04)
- Europe > France (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada (0.04)
- Media > Film (0.47)
- Leisure & Entertainment (0.47)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Beijing > Beijing (0.04)
Reviews: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
This paper studies the task of extreme classification with a large amount of target categories. It developed a hashing-based algorithm, MACH. Then a classifier is trained and applied for each hash mapping, on the reduced problem with much smaller amount of target classes. The prediction results of the sub-classifiers are then combined to re-constructed the final output. The proposed methods are demonstrated to be both efficient and effective in multiple datasets.